REFERENCE TO COMPUTER PROGRAM LISTING

[0001] This patent document includes a computer program listing appendix that was
submitted on a compact disc containing the following files: BasicDef.h (2KB); Camera.cpp
(11KB); Camera.h (4KB); Segment.cpp (258KB); Segment.h (9KB); TexImage.cpp (8KB);
TexImage.h (2KB); TexMap.cpp (11KB); TexMap.h (2KB); TexMesh.cpp (143KB); TexMesh.h
(7KB); tmLine2.ccp (12KB); tmLine2.h (2KB); tmTri3Area.ccp (23KB); tmTri3Area.h (4KB);
tmTriangle2.ccp (9KB); tmTriangle2.h (2KB); VolumeWP.cpp (27KB); and VolumeWP.h
(3KB), all created on November 9, 2001. The material on the compact disc is hereby incorporated
by reference herein in its entirety.

[0002] A portion of the disclosure of this patent document contains material which is subject
to copyright protection. The copyright owner has no objection to the facsimile reproduction by
anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark
Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND 

[0003] Three-dimensional visual representations and panoramic views of objects have
become increasingly important and useful in entertainment, commerce, and education. In
movies and animation, for example, panoramic views can add interesting special effects such as
rotation about stop-action scenes. In games and virtual reality applications, three-dimensional
representations of objects permit changes in the views of the objects according to the virtual
movement of a user. In electronic commerce, a web site can use three-dimensional
representations of merchandise to allow a customer to view the merchandise from any desired
perspective. Three-dimensional representations of works of art, exhibits, and antiques can be
similarly viewed for commercial or educational purposes, and three-dimensional or panoramic
views of objects can aid scientists and engineers during research and development of new
technologies or products.

[0004] One technique for creating a panoramic view of an object is sometimes referred to as
2.5-dimensional modeling. A 2.5-dimensional model of an object generally includes a series of
images of the object from different perspectives. Taking these images generally requires a
precision arrangement of cameras that photograph the object from the required perspectives. If
enough images are used, the images can be shown in sequence to provide smooth apparent
movement of a camera around the object. The 2.5-dimensional techniques have been effectively
used, for example, in movies to allow a change in camera angle during stop action or slow
motion filming. However, a 2.5-dimensional model of an object only provides specific views of
the object and may be unable to provide some of the desired views of the object.

[0005] A full three-dimensional model of an object describes the surface of the object in
three dimensions and permits rendering of any desired views of the object. Reconstruction of a
three-dimensional model of an object has generally required several images of an object with
each image having a known perspective (i.e., a known orientation and location of the camera
relative to the object). The known orientations and locations of the cameras allow determination
of projection matrices for the images. The projection matrices (or inverses of the projection
matrices) allow determination of the three-dimensional coordinates of points on the object from
the locations of the points in the images. The surface of the object can then be represented using
polygons with the vertices of each polygon having three-dimensional coordinates calculated
from the positions of the vertices in the images.

[0006] To avoid the complexity and expense of camera or turntable systems that provide
carefully measured camera orientations, efforts have been made to construct three-dimensional
models based on series of unmeasured images, i.e., images where camera parameters such as the
orientation and location of the camera relative to the object are unknown. Wolfgang Niem and
Jochen Wingbermuhle, "Automatic Reconstruction of 3D Objects Using a Mobile Monoscopic
Camera", Proceedings of the International Conference on Recent Advances in 3-D Digital
Imaging and Modeling, IEEE (1997) describes a camera calibration technique using a known
radially symmetric background pattern that is photographed with the object. With this technique,
circles surrounding the object appear elliptical in the images, and the camera parameters can be
determined from measurements of the ellipses in the images. This technique can encounter
difficulties in identifying thin lines corresponding to the circles in the background, particularly
since the object generally blocks the view of a portion of each surrounding circle.
Additionally, calculations for such systems are best performed in radial coordinates, which can
increase complexity and required processing power.

[0007] Other modeling techniques are known for images that do not contain a known
background. For example, M. Pollefeys, R. Koch and L. Van Gool, "Self-Calibration and
Metric Reconstruction in spite of Varying and Unknown Internal Camera Parameters",
International Journal of Computer Vision, 32(1), 7-25, 1999 provides methods for calibrating
images for camera parameters without using a known pattern in the images. R. Koch, M.
Pollefeys, L. Van Gool, "Realistic surface reconstruction of 3D scenes from uncalibrated image
sequences," Journal of Visualization and Computer Animation, Vol. 11, pp. 115-127, 2000 and M.
Pollefeys, R. Koch, M. Vergauwen, L. Van Gool, "Automated reconstruction of 3D scenes from
sequences of images," ISPRS Journal of Photogrammetry and Remote Sensing, 55(4) (2000), pp.
251-267 describe methods for reconstruction of a three-dimensional model of a selected portion
of an architectural structure. An article entitled "3D Model Acquisition from Extended Image
Sequences", Proc. 4th European Conference on Computer Vision, LNCS 1065, Cambridge, pages
687-695 (1996) by Paul Bradley, Phillip Torr, and Andrew Zisserman describes methods for
constructing a three-dimensional model from extended image sequences such as from a
camcorder.

[0008] The prior systems for generating 3D models from uncalibrated images have typically
been limited in the angular portion of an object represented or have required a large amount of
processing power to implement. Accordingly, the ability to construct a full 3D model of an object
has been out of reach for most users. Simpler and less expensive systems and methods for
generating three-dimensional models and/or panoramic views are desired.

SUMMARY 

[0009] In accordance with an aspect of the invention, a system or method for constructing a
three-dimensional model and/or a panoramic view of an object does not require a complex
mounting or measurement system for cameras. Instead, a user can take images without
measuring or knowing the positions and orientations of the camera. Software implementing a
reconstruction engine identifies features in the images and, from the shapes, sizes, and locations
of the features in the images, calibrates the images according to the perspective of the camera or
cameras taking the images. The reconstruction engine thus can determine a three-dimensional
model of the object in the images and can also perform image-based rendering, which generates
desired views of the object from the three-dimensional model.

[0010] One embodiment of the invention uses a background having separated marks with
known dimensions and positions. Photographs from several perspectives are taken of the object
with the background being visible in each photographed image. The reconstruction engine uses
the known dimensions of the marks in the background to calibrate the images for camera
positions. Since the dimensions of the background pattern are known, the reconstruction engine
can determine a transformation including a projection matrix for each image without reference to
the other images and without measurement of a camera's position or orientation. The
information known about the pattern reduces the processing power or time required to
determine camera parameters for the images.

[0011] In one embodiment, each mark in the background pattern includes one or more
rectangular segments with the rectangular segments having known proportions and relative
positions in the pattern. For example, each mark can include two rectangular segments that
when combined give the mark an L-shape. Each L-shaped mark provides six corner points
having known 3D coordinates, and the known 3D coordinates and the measured coordinates of the
corner points in an image indicate a transformation between the 3D coordinates and the image
coordinates. The known background coloring and mark shapes simplify the process of
distinguishing pixels in an image that correspond to the background from pixels that are part of a
silhouette of the object. Additionally, the marks are positioned so that each image will generally
include one or more marks that are separate from the object and therefore easier to identify and
measure. Further, when rectangular segments are employed, determination of projection
matrices or transforms can be performed in Cartesian coordinates.

[0012] Volume generation techniques using the determined transforms and object silhouettes
can construct a dense set of 3D points that define an approximation of the volume of the object.
In particular, applying one or more determined transforms to a dense set of points within a
volume transforms 3D coordinates to 2D coordinates in an image or images corresponding to the
one or more transforms. The set of 3D points that transform to points in the silhouette or
silhouettes are sometimes referred to herein as the volume points because those points are within
an approximation of the volume of the object.
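As an illustration only, the volume point test described above can be sketched as follows. The sketch assumes that each transform is available as a 3x4 homogeneous matrix mapping world coordinates to pixel coordinates and that each silhouette is a two-dimensional array of booleans. The names project and volume_points are hypothetical and are not taken from the program listing appendix.

```python
def project(transform, point):
    """Map a 3D world point to 2D pixel coordinates with a 3x4 matrix (assumed form)."""
    x, y, z = point
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in transform)
    return u / w, v / w


def volume_points(candidates, views):
    """Keep candidate points that land inside the silhouette of every view.

    views is a list of (transform, silhouette) pairs, where silhouette is a
    2D list of booleans indexed as silhouette[row][column].
    """
    kept = []
    for point in candidates:
        inside_all = True
        for transform, silhouette in views:
            col, row = project(transform, point)
            r, c = int(round(row)), int(round(col))
            in_bounds = 0 <= r < len(silhouette) and 0 <= c < len(silhouette[0])
            if not (in_bounds and silhouette[r][c]):
                inside_all = False
                break
        if inside_all:
            kept.append(point)
    return kept
```

Each additional view can only remove points, so the kept set tightens toward the object's volume as more silhouettes are applied.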

[0013] A surface reconstruction process uses the volume points or a sparse sampling of the
volume points to construct a polygon mesh representing the surface of the object. Surface
reconstruction processes that are known for generating a surface from unorganized points can
use the volume points, and information found in the volume generation process can reduce the
processing power required for the surface reconstruction.

[0014] In accordance with another aspect of the invention, a texturing process adds texture or
coloring from the images to the polygons (e.g., triangles) in the mesh. The texturing process
constructs for each polygon an ordered list of images that are candidates for providing texture to
the polygon. Generally, the list for a polygon ranks the candidate images according to the
direction of the camera axis for the image relative to the direction of the vector normal to the
surface of the polygon, and the list particularly identifies a best image and good images for
providing texture to the polygon. Candidate images are eliminated from the list for a polygon if
the projection of the polygon onto the image extends outside the object's silhouette in the image
or if another part of the object at least partially obscures the view of the projection in the image.
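The ranking of candidate images for one polygon can be sketched as below. This is a minimal illustration under stated assumptions: each camera axis is taken to point from the camera into the scene, the 60-degree limit for "good" images is a hypothetical value, and the silhouette and occlusion tests that eliminate candidates are omitted. The function name rank_candidate_images is likewise hypothetical.

```python
import math


def rank_candidate_images(normal, camera_axes, good_angle_deg=60.0):
    """Order image indices by how directly each camera faces the polygon."""
    def unit(v):
        length = math.sqrt(sum(c * c for c in v))
        return [c / length for c in v]

    n = unit(normal)
    scored = []
    for index, axis in enumerate(camera_axes):
        a = unit(axis)
        # A camera looking straight at the polygon has its axis opposite the normal.
        facing = -sum(x * y for x, y in zip(n, a))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, facing))))
        if angle < 90.0:  # cameras behind the polygon cannot provide texture
            scored.append((angle, index))
    scored.sort()
    ranked = [index for _, index in scored]
    best = ranked[0] if ranked else None
    good = [index for angle, index in scored if angle <= good_angle_deg]
    return ranked, best, good
```

The smallest angle identifies the best image, and the remaining entries stay in the list as fallbacks when the best image is later rejected or replaced.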

[0015] To improve the consistency and appearance of the texture in the 3D model, contrast
in the images can be adjusted to provide smooth transitions from each image to the images
corresponding to adjacent views of the object. Additionally, the polygon mesh is partitioned into
regions of contiguous polygons where all polygons in a region have the same image as the best
source for texture. Any small region can have its source image changed to that of a larger
adjacent or surrounding region if the source image for the larger region is in all of the lists for the
polygons in the smaller region.

[0016] To improve performance of texture mapping in a computer, images that are the best
source of texture for only a small number of polygons can be replaced with other images as the
source of texture. Eliminating such images reduces the number of images and the amount of
memory storage required in a texture memory. Additionally, multiple images can be combined
or blended to generate one or more texture images for storage and use in texture memory.

[0017] Another embodiment of the invention identifies features in the various images and
matches the features found in one image with corresponding features found in other images. The
changes in the corresponding features of the various images indicate the differences in
perspectives for the camera or cameras taking the images. Accordingly, a reconstruction engine
can perform camera calibration based on the appearance or locations of the matching features in
multiple images. Once the camera positions are determined, the reconstruction engine can
construct the three-dimensional model of the object.

[0018] One particular method for reconstruction uses fundamental matrices of two-view
correspondences and trifocal tensors of three-view correspondences for projective reconstruction
of a sparse 3D model. Additionally, a metric reconstruction based on the fundamental matrices
and two-view correspondences can generate a dense 3D model. The metric reconstruction
generally provides a 3D model with an appearance that is superior to the appearance of the
projective reconstruction, but metric reconstruction occasionally may fail for some sets of
images. The combination of the projective reconstruction and the metric reconstruction provides
a more robust 3D modeling engine since the projective reconstruction can be used when the
metric reconstruction fails. Additionally, calculations required for the two reconstructions
overlap so that the processing power requirements are not as large as required for two completely
separate reconstructions.

[0019] A reconstruction engine, which may be implemented in software, can perform
reconstruction processes that use a background pattern and/or features of the images for
reconstruction of a three-dimensional model of the object as described above.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0020] Fig. 1 is a flow diagram of a process using images containing a known background
for generation of a three-dimensional model in accordance with an embodiment of the invention.

[0021] Figs. 2A and 2B illustrate alternative backgrounds for use in the process of Fig. 1.

[0022] Figs. 3A, 3B, 3C, and 3D represent images taken from different perspectives of an
object on a background.

[0023] Fig. 4 is a flow diagram of a silhouette extraction process in accordance with an
embodiment of the invention.

[0024] Fig. 5 is a flow diagram of a surface reconstruction process in accordance with an
embodiment of the invention.

[0025] Fig. 6 is a flow diagram of a texture mapping process in accordance with an
embodiment of the invention.

[0026] Fig. 7 is a flow diagram of a process using images for generation of a three-
dimensional model in accordance with an embodiment of the invention.

[0027] Fig. 8 is a block diagram of a reconstruction engine in accordance with an
embodiment of the invention.

[0028] Use of the same reference symbols in different figures indicates similar or identical
items.

DETAILED DESCRIPTION

[0029] In accordance with an aspect of the invention, a reconstruction engine generates a full
three-dimensional model and/or a panoramic view of an object without requiring a complicated
system for taking pictures from measured locations relative to the object. Instead, the
reconstruction engine can use images of the object taken from almost any set of viewpoints
relative to the object. Based on the content of the images, the reconstruction engine calibrates
each image for camera orientation and location and does not require measurements of the
orientation or location.

[0030] One calibration technique uses images of the object on a known background pattern.
The background pattern contains marks with known dimensions that allow the reconstruction
engine to determine the camera parameters for the images. An alternative calibration technique
identifies and matches features in the images and determines camera parameters from differences
in the appearance of matching features in different images.

[0031] Fig. 1 illustrates a process 100 for constructing a three-dimensional model of an
object using images containing a known background pattern. Process 100 begins with a step 110
of placing an object on, in front of, or near a background so that the background appears in
images taken of the object. The background, which has a known pattern, can be a printed sheet
containing several separated marks having identifiable points such as corners. A person seeking
to generate a three-dimensional model of an object can easily create such a background using a
printer and, with a color printer, can select the coloring of the background according to the
coloring of the object as described further below.

[0032] Figs. 2A and 2B illustrate exemplary backgrounds 200 and 250 having patterns with
known geometry. In the illustrated embodiments, backgrounds 200 and 250 include a field
having a field color and marks having a pattern color. The field color preferably differs from the
coloring of the object and has a hue that simplifies identification of the object's shadow. Marks
on the field color preferably have a pattern color such as black, which provides a high contrast
from the field color and the object's color. The size of the background 200 or 250 is preferably
selected according to the size of the object so that the object when placed in the center of the
background 200 or 250 does not cover any of the marks and so that each image will preferably
contain at least three of the marks that are fully visible.

[0033] Background 200 has four marks 211 to 214. A distance L1 along a first axis
separates mark 211 from mark 212 (and mark 213 from mark 214), and a distance L2 along a second
axis separates mark 211 from mark 213 (and mark 212 from mark 214). Each of marks 211 to
214 is L-shaped and consists of two rectangular segments having known dimensions A, B, C,
and D. The asymmetry of each mark and the orientations of the marks as in background 200
simplify identification of a reference axis for the image. Additionally, the use of rectangular
segments simplifies identification of the marks and is convenient for expressing transforms in
Cartesian coordinates. The L-shaped marks 211 to 214 provide several corners that software can
easily locate and distinguish in images. As described further below, the known three-
dimensional coordinates of the corner points and the measured two-dimensional coordinates of
the corner points in an image can be plugged into formulae that provide the projection matrix (or
inverse of the projection matrix) for the image.

[0034] Background 250 has eight marks 211 to 214 and 221 to 224, which are also L-shaped.
As a specific configuration for background 250, marks 211 to 214 and marks 221 to 224 are
equally spaced at the perimeter of a 20L-by-20L square, where L can be any desired unit of
length. Each mark 211 to 214 is 4Lx4L in size with rectangular segments having width 2L, and
each mark 221 to 224 is 4Lx4L in size with rectangular segments having width L. With eight
marks, background 250 provides more corners that will be visible in the images (i.e., not
obscured by the object). Additionally, in each image, multiple marks will generally be separated
from the silhouette of the object and therefore easily identified. Background 250 therefore
provides for a robust determination of the projection matrices or transforms for the images.

[0035] Patterns 200 and 250 of Figs. 2A and 2B are merely examples of suitable patterns for
reconstruction of three-dimensional models. More generally, such patterns have unlimited
possible variations of sizes, colors, and the shapes and separation of marks. For example, the
L-shaped marks in the patterns can be replaced with square or rectangular marks or marks having
any separated shapes that provide recognizable points such as corners in an image. The spacing
of the marks can be symmetric or asymmetric.

[0036] With the object on the background, a user in step 120 (Fig. 1) takes several pictures of
the object and background using different camera angles. Taking the pictures does not require
any special equipment for positioning of cameras or measuring the position of a camera when
taking the picture. One camera can sequentially take all of the pictures. The camera can be a
handheld analog or digital camera, but generally, the processing techniques described herein are
easiest for processing of digital image data. Ideally, each image will include just the object and
the background with at least three marks being fully visible. The images are preferably taken
without using a flash, which can cause the images to differ in lighting and shadows.

[0037] Figs. 3A, 3B, 3C, and 3D illustrate four of the images of an object 300 and
background 200 from different camera locations. When taking the images, the camera can have
any positions relative to the object, provided that the images collectively provide a diverse set of
views of the object. Generally, eight or more images from points distributed around the object
are sufficient, but more (e.g., up to 30) images may provide a more accurate three-dimensional
model of the object. More images are preferred for objects with complicated shapes.
Accordingly, three-dimensional reconstruction does not require the large number of images or a
correspondingly powerful processing system that some other three-dimensional reconstruction
processes require.

[0038] Once the images are taken (and digitized if necessary), the reconstruction engine
processes digital image data to construct a three-dimensional model that the reconstruction
engine can use when rendering views of the object. The reconstruction engine is generally
implemented in software that can be executed on a conventional personal computer with a clock
speed of about 500 MHz or more.

[0039] In process 100 of Fig. 1, image processing begins in step 130 with image analysis that
extracts from each image the silhouette of the object, the separated mark regions, and the
locations of reference points associated with the mark regions. For silhouette extraction, the
reconstruction engine examines each pixel in each image to determine whether the pixel
represents part of the object, part of the background pattern, or part of the background field.
Generally, the colors may vary somewhat in different images because of differences in lighting
and/or differences in cameras used for the different images. Additionally, a shadow of the object
may lie on a portion of the background in an image.

[0040] A histogram can typically identify the colors corresponding to the field and pattern
colors in the background. In particular, a histogram of the colors in an image will have peaks
corresponding to the nearly uniform field and pattern colors. The field and pattern colors can
also be identified or distinguished as corresponding to peaks near the expected field and pattern
colors, from the location of the color near the edges of the image, and from uniformly colored
shapes having proportions consistent with the known shapes and relative locations of marks in
the pattern.
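A minimal sketch of finding histogram peaks is given below for illustration. It assumes 8-bit RGB pixels and a coarse binning of 32 levels per channel; both the bin size and the function name dominant_color_bins are hypothetical choices, and a practical implementation would also apply the edge-location and shape checks described above.

```python
from collections import Counter


def dominant_color_bins(pixels, bin_size=32, count=2):
    """Return the `count` most populated RGB bins, most populated first."""
    histogram = Counter(
        (r // bin_size, g // bin_size, b // bin_size) for r, g, b in pixels
    )
    return [bin_index for bin_index, _ in histogram.most_common(count)]
```

For a light field with black marks, the two largest bins would be expected to correspond to the field color and the pattern color, provided the object does not dominate the image.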

[0041] The silhouette of the object in an image includes the portion of the image that does
not match the field or pattern color of the background. Accordingly, a background having field
and pattern colors that differ from the coloring of the object may provide the best results for
silhouette extraction. However, portions of the object that match the field or pattern color of the
background can be identified from the known shape of the pattern, and additional image
processing of multiple images can help distinguish points of the object that happen to have the
field or pattern color.

[0042] Fig. 4 is a flow diagram of a silhouette extraction process 400 in accordance with an
embodiment of the invention. Silhouette extraction process 400 operates on an image
represented by a pixel map with each pixel in the image having corresponding Red-Green-Blue
(RGB) values that indicate the color of the pixel. To simplify shadow detection and adjustments
for differences in lighting or contrast, step 410 of process 400 transforms the RGB values of the
pixel map to HSV space and determines a hue for each pixel. Equations 1 indicate a standard
transformation from RGB space to HSV space and a hue value HUE for a pixel having RGB
values R, G, and B.

Equations 1:
$$
\begin{aligned}
I_1 &= (R + G + B)/3 \\
I_2 &= \log R - \log G \\
I_3 &= \tfrac{1}{2}\log(RG) - \log B \\
\mathrm{HUE} &= \arctan\!\left(\frac{\sqrt{3}\,(G - B)}{2R - G - B}\right)
\end{aligned}
$$
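For illustration, the per-pixel color values can be computed as in the sketch below, which reads Equations 1 as I1 = (R+G+B)/3, I2 = log R - log G, I3 = (1/2) log(RG) - log B, and HUE = arctan(sqrt(3)(G-B)/(2R-G-B)). The use of atan2 in place of a plain arctangent, the requirement that R, G, and B be positive, and the function name color_values are assumptions of this sketch.

```python
import math


def color_values(r, g, b):
    """Return (I1, I2, I3, HUE) for one pixel with positive R, G, B values."""
    i1 = (r + g + b) / 3.0
    i2 = math.log(r) - math.log(g)
    i3 = 0.5 * math.log(r * g) - math.log(b)
    # atan2 is an assumption of this sketch: it avoids a division by zero
    # when 2R - G - B = 0 and keeps the hue in the correct quadrant.
    hue = math.atan2(math.sqrt(3.0) * (g - b), 2.0 * r - g - b)
    return i1, i2, i3, hue
```

Pixels with a zero channel would need to be clamped to a small positive value before the logarithms are taken.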



[0043] Step 420 detects color edges in the image. In one embodiment, step 420 performs a
Gaussian smoothing operation on the RGB pixel map and then calculates the gradient of the
color values I2 and I3 found from the smoothed pixel map. Equations 2 indicate the values of
gradients G0, G45, G90, and G135 of color value I (I = I2 or I3) along respective directions 0°,
45°, 90°, and 135° through a pixel having coordinates x and y in the pixel map.

Equations 2:
$$
\begin{aligned}
G_{0}(x,y) &= I(x-1,y+1) + 2I(x,y+1) + I(x+1,y+1) - I(x-1,y-1) - 2I(x,y-1) - I(x+1,y-1) \\
G_{90}(x,y) &= I(x+1,y-1) + 2I(x+1,y) + I(x+1,y+1) - I(x-1,y-1) - 2I(x-1,y) - I(x-1,y+1) \\
G_{45}(x,y) &= I(x+1,y) + 2I(x+1,y+1) + I(x,y+1) - I(x-1,y) - 2I(x-1,y-1) - I(x,y-1) \\
G_{135}(x,y) &= I(x+1,y) + 2I(x+1,y-1) + I(x,y-1) - I(x-1,y) - 2I(x-1,y+1) - I(x,y+1)
\end{aligned}
$$
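The four directional gradients can be evaluated with a small routine such as the following sketch. It assumes the color value plane is stored as a list of rows indexed image[y][x], that the pixel is not on the image border, and that the second term of G0 is 2I(x,y+1) so that G0 has the same weighted-difference form as the other three gradients. The function name gradients is hypothetical.

```python
def gradients(image, x, y):
    """Return (G0, G45, G90, G135) at interior pixel (x, y).

    image is indexed as image[y][x] and holds one color value (I2 or I3).
    """
    def I(dx, dy):
        return image[y + dy][x + dx]

    g0 = I(-1, 1) + 2 * I(0, 1) + I(1, 1) - I(-1, -1) - 2 * I(0, -1) - I(1, -1)
    g90 = I(1, -1) + 2 * I(1, 0) + I(1, 1) - I(-1, -1) - 2 * I(-1, 0) - I(-1, 1)
    g45 = I(1, 0) + 2 * I(1, 1) + I(0, 1) - I(-1, 0) - 2 * I(-1, -1) - I(0, -1)
    g135 = I(1, 0) + 2 * I(1, -1) + I(0, -1) - I(-1, 0) - 2 * I(-1, 1) - I(0, 1)
    return g0, g45, g90, g135
```

On a plane whose value increases only with x, G0 is zero and G90 is largest, which matches the roles of the positive and negative terms in the formulas.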

[0044] Different materials in the image generally cause a change in color values I2 and I3 so
that a color edge corresponding to a local maximum in the gradient image often marks an edge
between different materials. Both the color edge of RGB values and the edge of color values I2
and I3 indicate initial edges for the object and the marks in the pattern.

[0045] After determination of initial color edges in the image, step 430 segments the image
into regions by color. Segmentation can be performed in three steps. First, a histogram-based
segmentation divides each image into several regions with each region containing pixels that
fall into the same color bin. Second, each region is divided into separate regions if the color
gradients indicate an edge within the region. Third, connected regions that are in separate bins
but have similar colors are merged together.

[0046] Step 440 detects pattern marks by identifying regions having a color, surroundings,
and a shape consistent with the known pattern. In particular, dark regions are candidate regions
for marks when the pattern color is black. The shape of the candidate region is compared to the
pattern via a re-projection method, which projects the pattern onto the image plane and then
compares the positions of the projected vertices with those detected for the candidate region.

[0047] Step 450 determines a coarse silhouette for the object by first removing the confirmed
mark regions of the pattern and regions having the field color. The remaining regions may
contain the object, shadow, and pattern mark regions connected to the object. Step 450 further
uses the detected mark regions to identify and remove the unidentified mark regions that may be
connected to the object's silhouette.



[0048] Step 460 removes the shadow of the object from the remaining region. The shadow
generally contains a portion of the background having less lighting. If the shadow is not too
dark, the hue of a shadow region will be the same as the hue of the background. Accordingly,
regions having the same hue as the background are removed from the silhouette of the object.
For darker regions, color values I2 and I3 are used to distinguish the shadow from the object's
silhouette. After removing the shadow, the remaining region is the silhouette of the object in the
image.

[0049] Returning to Fig. 1, a result of identifying the object's silhouettes in step 130 (and
process 400) is identification of a bit map that indicates whether each pixel is part of the object
(i.e., part of the silhouette) or part of the background. Process step 130 also identifies regions
corresponding to marks in the background pattern and determines the image coordinates X and Y
(e.g., column and row coordinates in the pixel map) of corners or other distinctive calibration
points of the marks. Optionally, the silhouette for an image and the regions identified as the
marks in the background can be overlaid on the image presented to a user for possible
modification. With this option, the user can improve the silhouette or mark regions if the
automated analysis erred.

[0050] Given the silhouettes of the object and the image coordinates of the calibration points
in the images, step 140 determines the camera parameters of the images. In particular, for each
image, a camera calibration process determines a projection matrix R and a translation vector T
for the image. The projection matrix R and the translation vector T are parts of a transform from
three-dimensional world coordinates (xw, yw, zw) to three-dimensional camera coordinates (x, y, z)
as defined in Equation 3.





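Equation 3:
$$
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= R \begin{pmatrix} x_w \\ y_w \\ z_w \end{pmatrix} + T
= \begin{pmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \\ r_7 & r_8 & r_9 \end{pmatrix}
\begin{pmatrix} x_w \\ y_w \\ z_w \end{pmatrix}
+ \begin{pmatrix} T_x \\ T_y \\ T_z \end{pmatrix}
$$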

[0051] The camera coordinates (x, y, z) have a z-axis along the optical axis of the camera, so
that Equations 4 give the image coordinates X and Y in terms of the camera coordinates and the
effective focal length f of the camera.



Equations 4:
$$
X = f\,\frac{x}{z}, \qquad Y = f\,\frac{y}{z}
$$
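As a sketch under stated assumptions, the transform from world coordinates to image coordinates can be written as a single function that first applies the matrix R and vector T and then divides by the camera z coordinate scaled by the focal length f. The function name world_to_image is hypothetical, and R is assumed to be given as three rows of three numbers.

```python
def world_to_image(R, T, f, world_point):
    """Map world coordinates to image coordinates (X, Y).

    Camera coordinates are R * world_point + T; the image coordinates are
    X = f * x / z and Y = f * y / z.
    """
    x, y, z = (
        sum(R[i][j] * world_point[j] for j in range(3)) + T[i] for i in range(3)
    )
    return f * x / z, f * y / z
```

The division by z is what makes distant calibration marks appear smaller in the image.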



[0052] Using a world coordinate system having the background in the plane of the xw and
yw axes (i.e., zw=0 for all points in the background), coordinates xw and yw are known for each
calibration point (e.g., corner point) in the background. The image coordinates X and Y for
calibration points are known from step 130. From Equations 4, X·y = Y·x for each point, and
substituting Equation 3 with zw=0 and dividing by Ty shows that components of projection
matrix R and translation vector T satisfy Equation 5 for each calibration point.

Equation 5:
$$
\begin{pmatrix} x_w Y & y_w Y & Y & -x_w X & -y_w X \end{pmatrix}
\begin{pmatrix} r_1/T_y \\ r_2/T_y \\ T_x/T_y \\ r_4/T_y \\ r_5/T_y \end{pmatrix}
= X
$$



[0053] Accordingly, a set of more than five calibration points identified in an image provides
an overdetermined system of equations (i.e., Equation 5 repeated for each identified calibration
point). Each separate L-shaped mark in the pattern of Fig. 2A or 2B provides six calibration
points and therefore six equations. Step 140 solves the system of equations using conventional
methods to determine values r1', r2', r4', r5', and Tx' of Equations 6.

Equations 6: r1'=r1/Ty; r2'=r2/Ty; r4'=r4/Ty; r5'=r5/Ty; Tx'=Tx/Ty
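One conventional way to solve the overdetermined system is linear least squares, sketched below using the normal equations and Gaussian elimination. The row of coefficients used for each calibration point is (xw·Y, yw·Y, Y, -xw·X, -yw·X) with right-hand side X, which follows from X·y = Y·x with zw=0. The choice of normal equations and the function names solve_linear and solve_scaled_parameters are assumptions of this sketch rather than the method of the program listing.

```python
def solve_linear(M, v):
    """Solve M p = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    p = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * p[j] for j in range(i + 1, n))
        p[i] = (aug[i][n] - tail) / aug[i][i]
    return p


def solve_scaled_parameters(points):
    """Least-squares estimate of (r1', r2', Tx', r4', r5').

    points is a list of (xw, yw, X, Y) tuples, one per calibration point.
    """
    rows = [[xw * Y, yw * Y, Y, -xw * X, -yw * X] for xw, yw, X, Y in points]
    rhs = [X for _, _, X, _ in points]
    # Normal equations: (A^T A) p = A^T b
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(5)]
    return solve_linear(AtA, Atb)
```

With noisy corner measurements, a factorization-based least-squares solver would usually be better conditioned than the normal equations used here.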



[0054] Projection matrix R being an orthonormal matrix implies that translation component
Ty is given in Equations 7 and projection matrix R is given in Equation 8. In Equation 8, r1, r2,
r4, and r5 can be determined from Equations 6 and 7, S is -sign(r1r4+r2r5), and components r7,
r8, and r9 can be determined from the outer product of the first two rows of the projection
matrix.



P ^ „ rj,2 Sr-[Sr^ -4(rV-r5'-r4''r2')}'^ 
Equations 7: Ty^ = — — i r — iJ — • 

2(rlV5'-r4V2') 



Sr = rl'^ + r2'^- r4'^ + r5'^ 



Equations: R = 



rl rS r9 



iff>0 
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R = 



rl rl 
rA r5 
-rl rS 



-Syll-r4^ -r5^ 
r9 



iff<0 



m 
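
The following sketch shows one way the computation of Ty and of the matrix R for the case f > 0 could be coded from the primed values r1', r2', r4', and r5'. It rests on several assumptions that the text does not state: the quantity r1'r5' - r4'r2' is nonzero, the sign of Ty is supplied by the caller (tySign), and S is taken as +1 when r1·r4 + r2·r5 is exactly zero. The names Pose and recoverPose are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

struct Pose {
    double ty2;               // Ty squared
    std::array<double, 9> r;  // r1..r9, row by row, for the case f > 0
};

// r1p, r2p, r4p, r5p are the primed values r1/Ty, r2/Ty, r4/Ty, r5/Ty.
// tySign (+1 or -1) is an assumed input; the sketch does not determine it.
inline Pose recoverPose(double r1p, double r2p, double r4p, double r5p, double tySign) {
    const double sr = r1p * r1p + r2p * r2p + r4p * r4p + r5p * r5p;
    const double d = r1p * r5p - r4p * r2p;  // assumed nonzero
    const double root = std::sqrt(std::max(0.0, sr * sr - 4.0 * d * d));
    const double ty2 = (sr - root) / (2.0 * d * d);
    const double ty = tySign * std::sqrt(ty2);

    const double r1 = r1p * ty, r2 = r2p * ty, r4 = r4p * ty, r5 = r5p * ty;
    const double s = (r1 * r4 + r2 * r5 > 0.0) ? -1.0 : 1.0;  // S = -sign(r1 r4 + r2 r5)
    const double r3 = std::sqrt(std::max(0.0, 1.0 - r1 * r1 - r2 * r2));
    const double r6 = s * std::sqrt(std::max(0.0, 1.0 - r4 * r4 - r5 * r5));

    // Third row is the cross product of the first two rows.
    const double r7 = r2 * r6 - r3 * r5;
    const double r8 = r3 * r4 - r1 * r6;
    const double r9 = r1 * r5 - r2 * r4;
    return Pose{ty2, {r1, r2, r3, r4, r5, r6, r7, r8, r9}};
}
```

The clamping with std::max guards the square roots against small negative arguments caused by rounding; it is a defensive choice of this sketch rather than part of the described method.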



[0055] The effective focal length f of the camera taking the image and the z component Tz of
the translation vector T can be determined by solving a system of equations found by inserting
the known image coordinate Y and world coordinates xw and yw for multiple calibration points
into Equation 9.

Equation 9:
\[
\begin{pmatrix} r_4 x_w + r_5 y_w + T_y & -Y \end{pmatrix}
\begin{pmatrix} f \\ T_z \end{pmatrix}
= Y\,(r_7 x_w + r_8 y_w)
\]
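
A minimal sketch of solving for f and Tz by least squares over several calibration points is shown below. It assumes the relation f·(r4·xw + r5·yw + Ty) - Tz·Y = Y·(r7·xw + r8·yw), which follows from Equations 3 and 4 with zw = 0, and it assumes the background plane is not parallel to the image plane (otherwise r7 = r8 = 0 and the system carries no information). The names Sample, FocalAndTz, and solveEquation9 are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Sample {
    double xw, yw, Y;  // world coordinates and measured image coordinate Y
};

struct FocalAndTz {
    double f;
    double tz;
    bool ok;  // false when the 2x2 normal equations are singular
};

// Least-squares solve of the two-unknown system (unknowns f and Tz).
inline FocalAndTz solveEquation9(const std::vector<Sample>& pts, double r4, double r5,
                                 double r7, double r8, double ty) {
    double a11 = 0.0, a12 = 0.0, a22 = 0.0, b1 = 0.0, b2 = 0.0;
    for (const Sample& p : pts) {
        const double yi = r4 * p.xw + r5 * p.yw + ty;  // coefficient of f
        const double ci = -p.Y;                        // coefficient of Tz
        const double rhs = p.Y * (r7 * p.xw + r8 * p.yw);
        a11 += yi * yi;
        a12 += yi * ci;
        a22 += ci * ci;
        b1 += yi * rhs;
        b2 += ci * rhs;
    }
    const double det = a11 * a22 - a12 * a12;
    if (std::fabs(det) < 1e-12) return FocalAndTz{0.0, 0.0, false};
    // Cramer's rule on the symmetric 2x2 normal equations.
    return FocalAndTz{(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det, true};
}
```

A negative f returned by such a solve would correspond to the f < 0 case of the matrix R.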



[0056] Step 140, using the above-described techniques, can determine for each image a
transform from three-dimensional world coordinates xw, yw, and zw to image coordinates X and
Y. Step 150 uses the transform and object silhouette for one or more of the images to generate a
dense set of points within an approximation of the object's volume. Volume generation in step
150 starts with one of the images and a candidate volume, e.g., a volume containing points
having world coordinates xw, yw, and zw such that 0<xw<xwmax, 0<yw<ywmax, and
0<zw<zwmax. The marks in the background can define boundaries xwmax and ywmax of the
candidate volume, and the boundary zwmax can be user selected or set according to xwmax and
ywmax.

[0057] Volume generation step 150 applies the transformation found for an image to a dense
set of candidate points in the candidate volume for the image. Every point that the transform
maps onto the silhouette for the object is considered a point in an approximate volume of the
object. Optionally, the approximate volume thus found can be used as a candidate volume for
another of the images to further refine the approximation of the volume of the object. The
process can be repeated until step 150 finds the volume points in a suitably refined approximate
volume.
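
The volume generation described above can be sketched as a filter over candidate points. In this hedged example the projection and the silhouette test are passed in as callbacks, because their concrete form (for instance the calibrated transform and a binary silhouette mask) is outside the sketch. The names Point3, Pixel, makeCandidateGrid, and carve are hypothetical and are not taken from VolumeWP.cpp.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Point3 {
    double x, y, z;
};

struct Pixel {
    double X, Y;
};

// Assumed callbacks: world point -> image location, and silhouette membership.
using Projector = std::function<Pixel(const Point3&)>;
using SilhouetteTest = std::function<bool(const Pixel&)>;

// Dense candidate points strictly inside 0<xw<xwmax, 0<yw<ywmax, 0<zw<zwmax,
// with n samples along each axis.
inline std::vector<Point3> makeCandidateGrid(double xwmax, double ywmax, double zwmax, int n) {
    std::vector<Point3> grid;
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= n; ++j)
            for (int k = 1; k <= n; ++k)
                grid.push_back(Point3{xwmax * i / (n + 1), ywmax * j / (n + 1),
                                      zwmax * k / (n + 1)});
    return grid;
}

// Keeps every candidate that the transform maps onto the silhouette.
inline std::vector<Point3> carve(const std::vector<Point3>& candidates,
                                 const Projector& project,
                                 const SilhouetteTest& insideSilhouette) {
    std::vector<Point3> kept;
    for (const Point3& p : candidates)
        if (insideSilhouette(project(p))) kept.push_back(p);
    return kept;
}
```

Refinement with a further image then amounts to calling carve again, with that image's projection and silhouette, on the points kept so far.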

[0058] Step 160 constructs a three-dimensional model of the surface of the object using the
volume points from step 150 and a reconstruction process such as process 500 of Fig. 5. In
process 500, an initial step 510 selects a set of volume points for surface reconstruction. As
noted above, step 150 finds a dense set of the volume points, e.g., with volume points
approximately as dense as pixels in the images. An exemplary embodiment of the invention
reduces required processing power by selecting a subset of the volume points by sampling the
volume points to generate a subset of volume points distributed uniformly throughout the object
surface.

[0059] Steps 520, 530, and 540 identify the surface of the object and can use known
techniques for unorganized points such as described in "Surface Reconstruction from Unorganized
Points", Hoppe et al., SIGGRAPH '92 Proceedings, pages 71-78, which is hereby incorporated by
reference in its entirety. In particular, step 520 partitions the selected volume points for
construction of a local connectivity graph. Identifying closest neighboring points generally
requires calculation of the distances between points, but this brute force method requires a large
amount of processing power. Accordingly, the volume points are hashed into different buckets
based on their three-dimensional coordinates. In particular, points in the same cube are in the
same bucket. Smaller cubes can be used to reduce the number of candidate points that need to be
considered when identifying the closest neighboring points, but more buckets may need to be
searched to find a desired number of points.
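
The bucket scheme can be illustrated with a hash map keyed by integer cube indices. The sketch below is an assumption about one possible layout, not the listing's implementation: the hash mixing constants, the choice to search the 27 cubes around a query point, and the names CellKey, Buckets, buildBuckets, and nearbyCandidates are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Point3 {
    double x, y, z;
};

// Integer index of the cube that contains a point.
struct CellKey {
    int i, j, k;
    bool operator==(const CellKey& o) const { return i == o.i && j == o.j && k == o.k; }
};

struct CellKeyHash {
    std::size_t operator()(const CellKey& c) const {
        // Simple prime-multiply-and-xor mix (an arbitrary illustrative choice).
        return (static_cast<std::size_t>(c.i) * 73856093u) ^
               (static_cast<std::size_t>(c.j) * 19349663u) ^
               (static_cast<std::size_t>(c.k) * 83492791u);
    }
};

using Buckets = std::unordered_map<CellKey, std::vector<std::size_t>, CellKeyHash>;

inline CellKey cellOf(const Point3& p, double cubeSize) {
    return CellKey{static_cast<int>(std::floor(p.x / cubeSize)),
                   static_cast<int>(std::floor(p.y / cubeSize)),
                   static_cast<int>(std::floor(p.z / cubeSize))};
}

// Points in the same cube land in the same bucket (stored as indices).
inline Buckets buildBuckets(const std::vector<Point3>& pts, double cubeSize) {
    Buckets buckets;
    for (std::size_t n = 0; n < pts.size(); ++n) buckets[cellOf(pts[n], cubeSize)].push_back(n);
    return buckets;
}

// Candidate neighbors of q: all points in q's cube and the 26 adjacent cubes.
inline std::vector<std::size_t> nearbyCandidates(const Buckets& buckets, const Point3& q,
                                                 double cubeSize) {
    std::vector<std::size_t> result;
    const CellKey c = cellOf(q, cubeSize);
    for (int di = -1; di <= 1; ++di)
        for (int dj = -1; dj <= 1; ++dj)
            for (int dk = -1; dk <= 1; ++dk) {
                auto it = buckets.find(CellKey{c.i + di, c.j + dj, c.k + dk});
                if (it != buckets.end())
                    result.insert(result.end(), it->second.begin(), it->second.end());
            }
    return result;
}
```

Distances then only need to be computed against the returned candidates; if too few are found, the search can be widened to more distant cubes, which reflects the trade-off noted above for small cube sizes.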

[0060] Once the local neighborhoods of the specified points are known, step 530 identifies an
approximate tangent plane for each of the surface points. A determination of the tangent plane
for a point can include calculating the centroid of the neighboring points and the covariance
matrix. The Jacobi method can determine the eigenvectors of the covariance matrix to get an
estimated normal to the surface at the point.
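
A compact sketch of the tangent plane estimate is given below. It computes the centroid and the (unnormalized) covariance matrix of the neighbors, diagonalizes the matrix with cyclic Jacobi rotations, and takes the eigenvector with the smallest eigenvalue as the normal, following the approach of Hoppe et al. The fixed sweep count and the names TangentPlane and fitTangentPlane are assumptions of this sketch, and the sign of the returned normal is arbitrary.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct TangentPlane {
    Vec3 centroid;
    Vec3 normal;  // unit length; orientation (sign) is not resolved here
};

// Assumes at least three non-collinear neighbors.
inline TangentPlane fitTangentPlane(const std::vector<Vec3>& neighbors) {
    Vec3 c{0.0, 0.0, 0.0};
    for (const Vec3& p : neighbors)
        for (int k = 0; k < 3; ++k) c[k] += p[k];
    for (int k = 0; k < 3; ++k) c[k] /= static_cast<double>(neighbors.size());

    Mat3 a{};  // covariance matrix (unnormalized), zero-initialized
    for (const Vec3& p : neighbors)
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j) a[i][j] += (p[i] - c[i]) * (p[j] - c[j]);

    Mat3 v{};  // accumulates eigenvectors as columns, starts as identity
    for (int i = 0; i < 3; ++i) v[i][i] = 1.0;

    for (int sweep = 0; sweep < 50; ++sweep) {
        for (int p = 0; p < 2; ++p) {
            for (int q = p + 1; q < 3; ++q) {
                if (a[p][q] == 0.0) continue;
                // Rotation angle that zeroes the off-diagonal element a[p][q].
                const double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
                const double t = (theta >= 0.0 ? 1.0 : -1.0) /
                                 (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
                const double cs = 1.0 / std::sqrt(t * t + 1.0);
                const double sn = t * cs;
                for (int k = 0; k < 3; ++k) {  // A <- A*J and V <- V*J
                    const double akp = a[k][p], akq = a[k][q];
                    a[k][p] = cs * akp - sn * akq;
                    a[k][q] = sn * akp + cs * akq;
                    const double vkp = v[k][p], vkq = v[k][q];
                    v[k][p] = cs * vkp - sn * vkq;
                    v[k][q] = sn * vkp + cs * vkq;
                }
                for (int k = 0; k < 3; ++k) {  // A <- J^T * A
                    const double apk = a[p][k], aqk = a[q][k];
                    a[p][k] = cs * apk - sn * aqk;
                    a[q][k] = sn * apk + cs * aqk;
                }
            }
        }
    }
    int m = 0;  // index of the smallest eigenvalue on the diagonal
    for (int i = 1; i < 3; ++i)
        if (a[i][i] < a[m][m]) m = i;
    return TangentPlane{c, Vec3{v[0][m], v[1][m], v[2][m]}};
}
```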

[0061] Step 540 samples three-dimensional space with evenly distributed sample points and
determines the distance of these points to the surface defined by the tangent planes found in step
530. The spacing of the sample points determines the granularity or size of surface triangles in
the reconstructed three-dimensional model. To determine the distance for a sample point A, the
volume point A' closest to the sample point A is identified and a distance is computed from a dot
product to the normal vector at volume point A'. The sign of the distance, which is negative for
points inside the volume and positive for points outside the volume, can be determined using a
minimum spanning tree and cost function as Hoppe et al. describe. Alternatively, determining
whether a point is one of the volume points from the volume generation (step 150 in Fig. 1)
indicates the sign and requires less processing power. Calculation of some of the signed
distances can be avoided to further reduce required processing power. In
particular, the marching cubes technique only requires accurate distances for points near the
surface of the object, and points far from the surface can be given a sign and an arbitrary
distance.
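
The signed distance for one sample point can be sketched as follows. The magnitude comes from the dot product with the normal at the nearest volume point, and the sign comes from a membership test against the generated volume, which is the cheaper alternative mentioned above. The brute-force nearest-point search, the callback isInsideVolume, and the names OrientedPoint and signedDistance are assumptions made to keep the example self-contained.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <vector>

using Vec3 = std::array<double, 3>;

struct OrientedPoint {
    Vec3 position;  // volume point A'
    Vec3 normal;    // estimated unit normal at A' (sign may be arbitrary)
};

// Returns a negative value inside the volume and a positive value outside.
// Assumes volumePoints is not empty.
inline double signedDistance(const Vec3& a, const std::vector<OrientedPoint>& volumePoints,
                             const std::function<bool(const Vec3&)>& isInsideVolume) {
    double best = std::numeric_limits<double>::max();
    const OrientedPoint* nearest = nullptr;
    for (const OrientedPoint& p : volumePoints) {  // brute-force search for A'
        double d2 = 0.0;
        for (int k = 0; k < 3; ++k) d2 += (a[k] - p.position[k]) * (a[k] - p.position[k]);
        if (d2 < best) {
            best = d2;
            nearest = &p;
        }
    }
    if (nearest == nullptr) return std::numeric_limits<double>::max();

    double dot = 0.0;  // projection of (A - A') onto the normal at A'
    for (int k = 0; k < 3; ++k) dot += (a[k] - nearest->position[k]) * nearest->normal[k];
    const double magnitude = std::fabs(dot);
    return isInsideVolume(a) ? -magnitude : magnitude;
}
```

In practice the nearest-point search would use the bucket structure from step 520 instead of a linear scan.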

[0062] Step 550 uses a marching cubes technique to reconstruct the surface of the object.
Step 560 then constructs a mesh of polygons (e.g., triangles) that correspond to portions of the
surface of the object. Marching cubes techniques for construction of polygon meshes are known
in the art and described, for example, by Schroeder et al., "The Visualization Toolkit", 2nd
Edition, Prentice Hall ©1998, which is hereby incorporated by reference in its entirety.

[0063] To permit user input to the surface reconstruction, the 3D reconstruction engine
can provide a rendering of the polygon mesh and permit a user to view and modify the polygon
mesh. The user can, for example, change the location of vertices and add or remove polygons to
improve the accuracy or appearance of the polygon mesh. In a similar fashion, the user can edit
the results of silhouette extraction and/or edit the results of texture mapping. Further, if at any
point the user is not satisfied with the appearance of the reconstruction, the user can add another
image for the analysis. The object's silhouette and the locations of the calibration points in the
added image can be determined as indicated above (steps 130 and 140), and the transform
determined from the calibration points can be applied to volume points in the approximate
volume of the object to further refine the approximate volume as described for step 150 of
process 100. Accordingly, a user can incrementally refine the 3D polygonal model by adding
pictures until the user is satisfied with the result.

[0064] Returning to Fig. 1, a step 170 performs texture mapping of the polygons in the mesh
from step 160. Texture mapping 170 assigns colors or textures to the polygons according to the
color or texture of the object as shown in the images.

[0065] Fig. 6 is a flow diagram of a texture mapping process in accordance with an
embodiment of the invention. Texture mapping process 600 includes a contrast adjustment step
610 that adjusts color values associated with pixels in the images. In particular, contrast
adjustment step 610 compares the contrast in images having adjacent perspectives, i.e., camera
positions closest to each other, and adjusts the color values to smooth the contrast change from
one image to the next. Contrast adjustment 610 can compensate for some of the differences in
the images that differences in lighting may have caused.

[0066] For each triangle in a mesh representing the surface of the object, step 620 constructs
an image list for the triangle. Ideally, the image list for a triangle contains the images that could
provide the texture for the triangle, and the images in the list are ranked from the best source of
texture to the worst. One technique for constructing the image list begins by ordering the images
according to the angle between the camera's axis as determined during camera calibration and
the direction of the normal vector of the triangle. In particular, candidate images that are initially
in the list for a triangle have a rank according to how parallel the optical axis of the camera that
took the image is to an inward normal of the triangle under consideration. An image is removed
from the list for a triangle if a projection of the triangle into the image has a portion outside the
object's silhouette in the image. Further, to solve the problem of a triangle that is partially or
fully obscured or blocked in a particular image, the image is removed from the triangle's list if
the projection of the triangle into the image overlaps the projection of any triangle that is closer
to the camera.
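
The initial ordering of the image list can be sketched as a sort on the dot product between each camera's optical axis and the triangle's inward normal, so that the most nearly parallel view comes first. The sketch assumes unit vectors and omits the subsequent silhouette and occlusion tests; the names View and rankImagesForTriangle are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

using Vec3 = std::array<double, 3>;

struct View {
    int imageId;
    Vec3 opticalAxis;  // assumed unit vector pointing from the camera into the scene
};

// Returns image ids ordered from the best texture source to the worst,
// judged only by how parallel the optical axis is to the inward normal.
inline std::vector<int> rankImagesForTriangle(std::vector<View> views, const Vec3& inwardNormal) {
    auto score = [&inwardNormal](const View& v) {
        return v.opticalAxis[0] * inwardNormal[0] + v.opticalAxis[1] * inwardNormal[1] +
               v.opticalAxis[2] * inwardNormal[2];
    };
    std::stable_sort(views.begin(), views.end(),
                     [&score](const View& a, const View& b) { return score(a) > score(b); });
    std::vector<int> ids;
    for (const View& v : views) ids.push_back(v.imageId);
    return ids;
}
```

A larger dot product means a smaller angle between the two directions, so sorting in descending order places the most head-on view at the front of the list.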

[0067] In an alternative method for ranking images as sources of texture, the 3D
reconstruction engine allows a user to view texture that the images provide for each triangle, and
for each triangle, the user can rank the images according to the subjectively judged appearance
of the different textures for the triangle.

[0068] Once a list of candidate images has been identified for each triangle, step 630
separates the triangles into groups of adjacent triangles, where each triangle in a group has the
same image as the best source for texture. Step 640 then examines the groups to identify a small
group of connected triangles that are surrounded by a larger group of triangles. Step 640
switches the image that is initially the source of texture for the small group of triangles to the
same image that is the source of texture for triangles in the surrounding group if the source
image for the surrounding group is in each of the image lists for the triangles in the smaller
group. This change in texture source reduces artifacts that might appear in rendered images at the
boundaries between triangles having different images as source for texture.

[0069] Texture mapping processes for rendering an image from a three-dimensional model
are well known in the art and commonly implemented in a 3D software package (e.g., OpenGL)
and a video card that implements or supports the software package. Using a large number of
images for texture requires a large texture memory and generally slows down the rendering
process. To improve performance, step 650 removes any image that is a texture source for just a
few triangles and uses other images from the image lists as the texture sources for those
triangles. The removal of any of the images in step 650 reduces the required amount of texture
memory and correspondingly improves performance during rendering of images.
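
One possible reading of step 650 is sketched below: count how many triangles use each image as their best source, drop images used by fewer than a threshold number of triangles, and reassign the affected triangles to the next retained image in their lists. The threshold parameter minTriangles, the single-pass strategy, the fallback to the original best image when no retained image is available, and the name assignTextureSources are all assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// imageLists[t] holds the ranked image ids for triangle t, best source first.
// Returns the chosen image id per triangle (-1 if a triangle has no candidates).
inline std::vector<int> assignTextureSources(const std::vector<std::vector<int>>& imageLists,
                                             std::size_t minTriangles) {
    // Count how many triangles use each image as their best source.
    std::unordered_map<int, std::size_t> useCount;
    for (const std::vector<int>& list : imageLists)
        if (!list.empty()) ++useCount[list.front()];

    std::vector<int> chosen;
    chosen.reserve(imageLists.size());
    for (const std::vector<int>& list : imageLists) {
        int pick = -1;
        for (int id : list) {  // first listed image that is retained
            auto it = useCount.find(id);
            if (it != useCount.end() && it->second >= minTriangles) {
                pick = id;
                break;
            }
        }
        // Fallback (assumption): keep the original best image if nothing else fits.
        if (pick < 0 && !list.empty()) pick = list.front();
        chosen.push_back(pick);
    }
    return chosen;
}
```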

[0070] Step 660 further improves texturing performance by constructing one or more texture
images. Step 660 uses multiple images in the construction of a texture image. The construction
process of step 660 identifies regions in the multiple images that actually provide texture for
triangles in the three-dimensional model. Step 660 maps those regions (e.g., triangles) in the
source images to regions in the texture image. A smoothing or refining process for the
texture image blends colors from images at boundaries in the texture image corresponding to
different images. The blending reduces artifacts that could otherwise result at the boundaries
between triangles. One texture image replaces multiple original images, thereby reducing the
required amount of texture memory.

[0071] Once each triangle has an assigned image (or texture image), known rendering
techniques map the coloring from the assigned image to the triangle. The reconstruction engine
can thus use the three-dimensional model of the object with the determined textures to generate
any desired views of the object or to generate a series of views that form a panorama of the
object. As described above, if the user is not satisfied with the end result, the 3D reconstruction
engine can allow the user to modify the 3D model or allow the user to add another image of the
object for analysis in generating a refined 3D model.

[0072] The reconstructed 3D model can be exported for use in other software. In particular,
the 3D model can be converted to Arcsoft 3D file format or any available format for 3D models.
Additionally, the 3D reconstruction engine or the format conversion process can modify the
complexity of the 3D model (e.g., the number of geometric primitives such as triangles, edges,
vertices, etc.) to meet the quality or rendering speed requirements of a particular user or software
package.

[0073] Fig. 7 is a flow diagram of a panorama construction process 700 in accordance with
another embodiment of the invention. Process 700 begins with a step 710 of taking several (e.g.,
8 to 12) images of an object from different perspectives. Unlike process 100 described above, no
special background is in the images, and process 700 performs camera calibration based on
features of the object in the images.

[0074] Step 720 analyzes each image to identify features of the object such as corners. Step
730 then considers pairs of the images and for each pair matches the identified features in the
two images. In particular, step 730 recursively determines a fundamental matrix that maps the
features of one image in the pair of images to the features of the other image in the pair. Each
iteration of the recursive operation adjusts the coefficients of the fundamental matrix to improve
the number of correspondences of the features in the two images.
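
To make the notion of "number of correspondences" concrete, the sketch below counts feature pairs that are consistent with a candidate fundamental matrix F. It assumes the standard epipolar constraint, in which matching points in homogeneous image coordinates satisfy p2^T F p1 = 0, and it uses the algebraic residual with a caller-chosen tolerance rather than a geometric distance. The names Match, epipolarResidual, and countCorrespondences are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;

// A tentative feature match: (x1, y1) in the first image, (x2, y2) in the second.
struct Match {
    double x1, y1, x2, y2;
};

// Algebraic epipolar residual p2^T * F * p1 with homogeneous points (x, y, 1).
inline double epipolarResidual(const Mat3& F, const Match& m) {
    const std::array<double, 3> p1 = {m.x1, m.y1, 1.0};
    const std::array<double, 3> p2 = {m.x2, m.y2, 1.0};
    double r = 0.0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) r += p2[i] * F[i][j] * p1[j];
    return r;
}

// Number of matches whose residual magnitude is within the tolerance.
inline std::size_t countCorrespondences(const Mat3& F, const std::vector<Match>& matches,
                                        double tol) {
    std::size_t count = 0;
    for (const Match& m : matches)
        if (std::fabs(epipolarResidual(F, m)) <= tol) ++count;
    return count;
}
```

An iterative estimator could use such a count as the score to improve from one iteration to the next.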

[0075] Step 740 considers triplets of images and compares the features in three images. The
technique recursively determines a trifocal tensor for associating features of one image with
the corresponding features in the other two images.

[0076] Step 750 reconstructs a projective three-dimensional model of the object using the
fundamental matrices and the trifocal tensors. Paul Beardsley, Phil Torr, and Andrew
Zisserman, "3D Model Acquisition from Extended Image Sequences", Proc. 4th European
Conference on Computer Vision, LNCS 1065, Cambridge, pages 687-695 (1996) and R. Hartley
and A. Zisserman, "Multiple View Geometry" describe suitable techniques for generating a
projective three-dimensional model as in step 750 and are hereby incorporated by reference in
their entirety.

[0077] Projective three-dimensional models are subject to distortion of the shape of the
object. Accordingly, step 760 constructs a metric three-dimensional model that in many cases
provides a more accurate representation of the surface of an object. Pollefeys et al., "Self
Calibration and Metric Reconstruction in Spite of Varying and Unknown Intrinsic Camera
Parameters", International Journal of Computer Vision, (1998) and M. Pollefeys, "SIGGRAPH
2001 - Course Notes: Obtaining 3D Models with a Hand-held Camera" describe suitable
techniques for generating a metric reconstruction of the surface of an object and are hereby
incorporated by reference in their entirety.

[0078] In accordance with an aspect of the invention, process 700 in step 770 compares the
projective and metric three-dimensional models of the object and selects or constructs a three-
dimensional model for use. The metric three-dimensional model generally provides a more
accurate model of the object's surface and is selected in step 770. However, for some sets of
images, metric reconstruction fails to provide reasonable results, and step 770 selects the
projective three-dimensional model. This capability makes process 700 more robust in
determining three-dimensional models of objects.

[0079] Texture can be added to the mesh of polygons that process 700 constructs. In
particular, texture mapping process 600 of Fig. 6 can provide texture to the three-dimensional
model found in step 770.

[0080] Fig. 8 illustrates a reconstruction engine 800 in accordance with an embodiment of
the invention. Reconstruction engine 800, which would typically be implemented in software
executed by a general-purpose computer, includes software procedures or units that implement
alternative data paths for the three-dimensional reconstruction processes 100 and 700 of Figs. 1
and 7. In particular, reconstruction engine 800 includes a silhouette extraction unit 810, a
volume generation unit 820, a sparse surface reconstruction unit 830, and a texture mapping unit
840 used to perform steps 130, 150, 160, and 170 in the three-dimensional reconstruction
process 100 of Fig. 1.

[0081] Sparse surface reconstruction unit 830 and texture mapping unit 840 are used with a
feature finding unit 850, a fundamental matrix and two-view unit 855, a trifocal tensor and three-
view correspondence unit 860, a projective reconstruction unit 865, a metric reconstruction unit
870, an image rectification unit 875, a two-view dense stereo estimation unit 880, a multi-view
dense correspondence unit 885, and a dense surface reconstruction unit 890 in implementing the
reconstruction process 700 of Fig. 7.

[0082] Reconstruction engine 800 initially sends an input series of images of an object to
silhouette extraction unit 810 or feature finding unit 850 depending on the process selected for
construction of a three-dimensional model of the object. In some cases, both processes can be
used on the same set of images, and the resulting three-dimensional models of the object can be
compared or combined to improve the accuracy of the three-dimensional model.

[0083] The computer program listing appendix includes a source code description of portions
of reconstruction engine 800. In particular, files Segment.h and Segment.cpp illustrate
techniques used in an embodiment of a silhouette extraction process. Files VolumeWP.h and
VolumeWP.cpp illustrate techniques used in an embodiment of a volume generation process.
Files BasicDef.h, Camera.h, Camera.cpp, TexImage.h, TexImage.cpp, TexMap.h, TexMap.cpp,
TexMesh.h, and TexMesh.cpp illustrate techniques for an embodiment of 3D polygonal model
construction and texture mapping.

[0084] Although the invention has been described with reference to particular embodiments,
the description is only an example of the invention's application and should not be taken as a
limitation. For example, although the above description describes imaging of an object, the term
object is not intended to exclude subjects such as people or animals, which could also be the
object of images and image processing described above. Various other adaptations and
combinations of features of the embodiments disclosed are within the scope of the invention as
defined by the following claims.






